Reinforcement learning (RL) problems can be challenging without well-shaped rewards. Prior work on provably efficient RL methods generally proposes to address this issue with dedicated exploration strategies. However, another way to tackle this challenge is to reformulate it as a multi-task RL problem, where the task space contains not only the challenging task of interest but also easier tasks that implicitly function as a curriculum. Such a reformulation opens up the possibility of running existing multi-task RL methods as a more efficient alternative to solving a single challenging task from scratch. In this work, we provide a theoretical framework that reformulates a single-task RL problem as a multi-task RL problem defined by a curriculum. Under mild regularity conditions on the curriculum, we show that sequentially solving each task in the multi-task RL problem is more computationally efficient than solving the original single-task problem, without any explicit exploration bonuses or other exploration strategies. We also show that our theoretical insights can be translated into an effective practical learning algorithm that can accelerate curriculum learning on simulated robotic tasks.
translated by 谷歌翻译
Tensor program tuning is a non-convex objective optimization problem, to which search-based approaches have proven to be effective. At the core of the search-based approaches lies the design of the cost model. Though deep learning-based cost models perform significantly better than other methods, they still fall short and suffer from the following problems. First, their feature extraction heavily relies on expert-level domain knowledge in hardware architectures. Even so, the extracted features are often unsatisfactory and require separate considerations for CPUs and GPUs. Second, a cost model trained on one hardware platform usually performs poorly on another, a problem we call cross-hardware unavailability. In order to address these problems, we propose TLP and MTLTLP. TLP is a deep learning-based cost model that facilitates tensor program tuning. Instead of extracting features from the tensor program itself, TLP extracts features from the schedule primitives. We treat schedule primitives as tensor languages. TLP is thus a Tensor Language Processing task. In this way, the task of predicting the tensor program latency through the cost model is transformed into a natural language processing (NLP) regression task. MTL-TLP combines Multi-Task Learning and TLP to cope with the cross-hardware unavailability problem. We incorporate these techniques into the Ansor framework and conduct detailed experiments. Results show that TLP can speed up the average search time by 9.1X and 3.0X on CPU and GPU workloads, respectively, compared to the state-of-the-art implementation. MTL-TLP can achieve a speed-up of 4.7X and 2.9X on CPU and GPU workloads, respectively, using only 7% of the target hardware data.
translated by 谷歌翻译
在本文中,我们介绍了DA $^2 $,这是第一个大型双臂灵敏性吸引数据集,用于生成最佳的双人握把对,用于任意大型对象。该数据集包含大约900万的平行jaw grasps,由6000多个对象生成,每个对象都有各种抓紧敏度度量。此外,我们提出了一个端到端的双臂掌握评估模型,该模型在该数据集的渲染场景上训练。我们利用评估模型作为基准,通过在线分析和真实的机器人实验来显示这一新颖和非平凡数据集的价值。所有数据和相关的代码将在https://sites.google.com/view/da2dataset上开源。
translated by 谷歌翻译
全向图像和视频可以在虚拟现实(VR)环境中提供真实世界场景的沉浸式体验。我们在本文中介绍了一项感知全向图像质量评估(IQA)研究,因为在VR环境下提供良好的经验非常重要。我们首先建立一个全向IQA(OIQA)数据库,其中包括16个源图像和320个失真的图像,这些图像被4种通常遇到的失真类型降解,即JPEG压缩,JPEG2000压缩,高斯模糊和高斯噪声。然后,在VR环境中的OIQA数据库上进行了主观质量评估研究。考虑到人类只能在VR环境中的一个运动中看到场景的一部分,因此视觉注意力变得极为重要。因此,我们还在质量评级实验过程中跟踪头部和眼动数据。原始和扭曲的全向图像,主观质量评级以及头部和眼动数据构成了OIQA数据库。在OIQA数据库上测试了最先进的全参考(FR)IQA测量,并进行了一些与传统IQA不同的新观察结果。
translated by 谷歌翻译
深度神经网络的成功在很大程度上取决于大量高质量注释的数据的可用性,但是这些数据很难或昂贵。由此产生的标签可能是类别不平衡,嘈杂或人类偏见。从不完美注释的数据集中学习无偏分类模型是一项挑战,我们通常会遭受过度拟合或不足的折磨。在这项工作中,我们彻底研究了流行的软马克斯损失和基于保证金的损失,并提供了一种可行的方法来加强通过最大化最小样本余量来限制的概括误差。我们为此目的进一步得出了最佳条件,该条件指示了类原型应锚定的方式。通过理论分析的激励,我们提出了一种简单但有效的方法,即原型锚定学习(PAL),可以轻松地将其纳入各种基于学习的分类方案中以处理不完美的注释。我们通过对合成和现实世界数据集进行广泛的实验来验证PAL对班级不平衡学习和降低噪声学习的有效性。
translated by 谷歌翻译
可扩展的编码,可以适应通道带宽变化,在当今复杂的网络环境中表现良好。然而,现有的可扩展压缩方法面临两个挑战:降低压缩性能和可扩展性不足。在本文中,我们提出了第一所学习的细粒度可扩展图像压缩模型(DeepFGS)来克服上述两个缺点。具体地,我们介绍一个特征分离骨干,将图像信息划分为基本和可伸缩的功能,然后通过信息重新排列策略通过通道重新分配特征通道。以这种方式,我们可以通过一次通过编码来生成连续可扩展的比特流。此外,我们重复使用解码器以降低DeepFGS的参数和计算复杂性。实验表明,我们的DeePFGS优于PSNR和MS-SSIM度量中的所有基于学习的可伸缩图像压缩模型和传统可伸缩图像编解码器。据我们所知,我们的DeePFGS是对学习的细粒度可扩展编码的首次探索,与基于学习的方法相比,实现了最优质的可扩展性。
translated by 谷歌翻译
这项工作提出了一种新的计算框架,用于学习用于真实数据集的明确生成模型。特别地,我们建议在包含多个独立的多维线性子空间组成的特征空间中的多类多维数据分发和{线性判别表示(LDR)}之间学习{\ EM闭环转录}。特别地,我们认为寻求的最佳编码和解码映射可以被配制为编码器和解码器之间的{\ em二手最小游戏的均衡点}。该游戏的自然实用功能是所谓的{\ em速率减少},这是一个简单的信息定理措施,用于特征空间中子空间类似的高斯的混合物之间的距离。我们的配方利用来自控制系统的闭环误差反馈的灵感,避免昂贵的评估和最小化数据空间或特征空间的任意分布之间的近似距离。在很大程度上,这种新的制定统一了自动编码和GaN的概念和益处,并自然将它们扩展到学习多级和多维实际数据的判别和生成}表示的设置。我们对许多基准图像数据集的广泛实验表明了这种新的闭环配方的巨大潜力:在公平的比较下,学习的解码器的视觉质量和编码器的分类性能是竞争力的,并且通常比基于GaN,VAE或基于GaN,VAE或基于GaN,VAE的方法更好的方法两者的组合。我们注意到所以,不同类别的特征在特征空间中明确地映射到大约{em独立的主管子空间};每个类中的不同视觉属性由每个子空间中的{\ em独立主体组件}建模。
translated by 谷歌翻译
In recent years, large amounts of effort have been put into pushing forward the real-world application of dynamic digital human (DDH). However, most current quality assessment research focuses on evaluating static 3D models and usually ignores motion distortions. Therefore, in this paper, we construct a large-scale dynamic digital human quality assessment (DDH-QA) database with diverse motion content as well as multiple distortions to comprehensively study the perceptual quality of DDHs. Both model-based distortion (noise, compression) and motion-based distortion (binding error, motion unnaturalness) are taken into consideration. Ten types of common motion are employed to drive the DDHs and a total of 800 DDHs are generated in the end. Afterward, we render the video sequences of the distorted DDHs as the evaluation media and carry out a well-controlled subjective experiment. Then a benchmark experiment is conducted with the state-of-the-art video quality assessment (VQA) methods and the experimental results show that existing VQA methods are limited in assessing the perceptual loss of DDHs. The database will be made publicly available to facilitate future research.
translated by 谷歌翻译
Data-Free Class Incremental Learning (DFCIL) aims to sequentially learn tasks with access only to data from the current one. DFCIL is of interest because it mitigates concerns about privacy and long-term storage of data, while at the same time alleviating the problem of catastrophic forgetting in incremental learning. In this work, we introduce robust saliency guidance for DFCIL and propose a new framework, which we call RObust Saliency Supervision (ROSS), for mitigating the negative effect of saliency drift. Firstly, we use a teacher-student architecture leveraging low-level tasks to supervise the model with global saliency. We also apply boundary-guided saliency to protect it from drifting across object boundaries at intermediate layers. Finally, we introduce a module for injecting and recovering saliency noise to increase robustness of saliency preservation. Our experiments demonstrate that our method can retain better saliency maps across tasks and achieve state-of-the-art results on the CIFAR-100, Tiny-ImageNet and ImageNet-Subset DFCIL benchmarks. Code will be made publicly available.
translated by 谷歌翻译
Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at https://github.com/google-research/big_vision
translated by 谷歌翻译